The Journal of Agricultural Education (JAE) requires authors to follow the guidelines stated in the
Publication Manual of the American Psychological Association [APA] (2009) in preparing research 
manuscripts, and to utilize accepted research and statistical methods in conducting quantitative research 
studies. The APA recommends the reporting of effect sizes in quantitative research, when appropriate. 
JAE now requires the reporting of effect size when reporting statistical significance in quantitative 
manuscripts. The purposes of this manuscript are to describe the research foundation supporting the 
reporting of effect size in quantitative research and to provide examples of how to calculate effect size for 
some of the most common statistical analyses utilized in agricultural education research. 
Recommendations for appropriate effect size measures and interpretation are included. The assumptions 
and limitations inherent in the reporting of effect size in research are also incorporated. 
Introduction

“At present, too many research results in 
education are blatantly described as 
significant, when they are in fact trivially 
small and unimportant” (Carver, 1993, p. 
287). 

The term ‘effect size’ describes indices that measure the magnitude of treatment effects. Effect size differs from significance testing because it focuses on the meaning of the results and allows comparison between or among studies, which in turn enables researchers to judge the practical significance of quantitative research results. “For the reader to appreciate the magnitude or importance of a study’s findings, it is almost always necessary to include some measure of effect size in the Results section” (APA, 2009, p. 34). Additionally, effect size encourages a meta-analytic perspective, making it possible to compare results across studies and to demonstrate the repeatability of findings. Starting in January 2010, the Journal of Agricultural Education (JAE) requires that “Authors MUST report effect sizes when reporting statistical significance for quantitative data analyses” (American Association for Agricultural Education [AAAE], 2010, p. 1).

Effect size has not been consistently reported in JAE manuscripts in the 12 issues published in the last three years (2007-2009), and the correct reporting of effect size has actually declined from a similar period 10 years earlier (1997-1999) (Table 1). Of the 119 manuscripts published in 2007-2009, effect size should have been reported in 55. An analysis of these manuscripts revealed that effect size was reported correctly in 17 (30.9%) and was either not reported at all or reported or interpreted incorrectly or inappropriately in the remaining 38 (69.1%). The data in Table 1 show that the correct reporting of effect size declined from 36.8% in 1997-1999 to 30.9% in 2007-2009. This comparison addressed JAE; however, it is probable that similar results would be obtained if this analysis were conducted for journals throughout the social sciences.

Table 1

Comparison of Effect Size Reporting and Interpretation in the Journal of Agricultural Education between 1997-1999 and 2007-2009

| Year range | Total articles published (n) | Number of articles for which effect size should have been reported (n) | Effect size correctly reported and interpreted (n/%ᵃ) | Effect size not reported, or incorrectly reported or interpreted (n/%ᵃ) |
|---|---|---|---|---|
| 1997-1999 | 87 | 38 | 14/36.8% | 24/63.2% |
| 2007-2009 | 119 | 55 | 17/30.9% | 38/69.1% |

ᵃ The n and % reported are based on the number of articles for which effect size should have been reported, as shown in column 3.


As discussed below, the reporting of effect size has been strongly recommended by numerous researchers and journals. This article focuses on suggestions for reporting and interpreting effect size for the inferential statistics most commonly reported in manuscripts published in JAE. This manuscript is an updated and refocused version of a previously published manuscript (Kotrlik & Williams, 2003).

Theoretical Base 

The concept of ‘effect size’ was introduced by Pearson as early as 1901. Interest in reporting effect size has risen substantially over the last few decades and has become even more widespread in the research literature in recent years. Kirk (1996) cited the need to report effect size, and an APA Task Force indicated that researchers should “Always provide some effect-size estimate when reporting a p value. . . . reporting and interpreting effect sizes in the context of previously reported effects is essential to good research” (Wilkinson & APA Task Force on Statistical Inference, 1999, p. 599).

Some common definitions for effect size 
include: 

• “. . . a measure of the degree of difference or association deemed large enough to be of ‘practical significance’ . . .” (Cohen, 1962);

• “. . . the degree to which the phenomenon is present in the population . . .” (Cohen, 1988, p. 9);

• “. . . estimate of the degree to which the 
phenomenon being studied . . . exists in the 
population . . .” (Hair, Black, Babin, 
Anderson, & Tatham, 2006, p. 2); and 

• “. . . magnitude or importance of a study’s 
findings . . .” (American Psychological 
Association, 2009, p. 34). 

Maxwell and Delaney (1990) indicated that two categories of effect size measures are commonly used in the literature: measures of effect size based on group mean differences, and measures of strength of association based on variance accounted for.

The Importance of Effect Size 

In 1901, Karl Pearson stated that statistical 
significance must be supplemented because it 
provides the reader with only a partial 
explanation of the importance or significance of 
the results (Kirk, 1996). Subsequently, Fisher 
(1925) proposed that, when reporting research 
findings, researchers should present measures of 
the strength of association or correlation ratios. 
Since these early observations, many researchers have promoted the use of effect size to complement or even replace statistical significance testing, because effect size helps the reader interpret the results and provides a method for comparing results between or among studies (Cohen, 1965, 1990, 1994; Hair et al., 2006; Kirk, 1996, 2001; Thompson, 1998, 2002). Effect size can also be valuable in characterizing the degree to which sample results diverge from the null hypothesis (Cohen, 1988, 1994). Therefore, reporting effect size allows a researcher to judge the magnitude of the differences between or among groups, which increases the researcher’s capability to compare current results with previous research and to judge the practical significance of the results.

JAE requires authors to prepare their manuscripts in accordance with the Publication Manual of the American Psychological Association (2009), which provides guidance for authors regarding effect size (AAAE, 2010). The APA’s emphasis on effect size was preceded by an APA Task Force recommendation that strongly encouraged researchers to report effect sizes such as Cohen’s d, Cohen’s f, eta squared (η²), or adjusted R² (Wilkinson & APA Task Force on Statistical Inference, 1999, p. 599).

Ample support for the APA’s recommendations regarding the reporting of effect size is available in the research literature. For example, Fan (2001) indicated that good research presents both statistical significance testing results and effect sizes. Baugh (2002) stated that “Effect size reporting is increasingly recognized as a necessary and responsible practice” (p. 255). It is the researcher’s duty to adhere to stringent analytical and reporting methods to ensure the proper interpretation and application of research results, and the reporting of effect size is part of this duty.

Why Report Effect Size in Addition to Statistical 
Significance? 

Reporting effect size in addition to reporting 
statistical significance is important because 
many researchers assume a p value provides an 
indicator of both statistical and practical 
significance. Attention to the misuse of 
statistical testing spans the research literature 
from Cohen (1988) to Kline (2009) and 
Thompson (2009), and will likely continue for 
years to come. Misuse begins as early in the 
study design as selection of the alpha value and 
continues through the interpretation of the 
results of the selected statistical test. 

Researchers set an alpha value (the probability of Type I error) based on the amount of risk they are willing to accept of incorrectly rejecting the null hypothesis, as well as on previous research in their field (Hair et al., 2006; Mendenhall, Beaver, & Beaver, 1999). Type II error is generally not considered, which puts the practical significance of the study’s findings at risk. Once the statistical test is conducted, many researchers misinterpret the results by believing that a small p value indicates a treatment effect of large magnitude and that statistical significance means theoretical or practical significance (Nickerson, 2000). A researcher must remember that “Statistical significance testing does not imply meaningfulness” (Olejnik & Algina, 2000, p. 241). Statistical significance testing determines the probability of obtaining a sample outcome at least as extreme as the one observed if only chance were operating (i.e., if the null hypothesis were true), while it is effect size that addresses practical significance or meaningfulness (Fan, 2001). Additionally, Kirk (2001) reminds researchers that statistical significance depends on sample size, and notes that effect size assists in interpreting results, making trivial effects harder to ignore and helping the researcher decide whether results are practically significant.
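Kirk’s point about sample size can be illustrated with a short sketch. It holds Cohen’s d fixed at a trivial .10 and computes the independent-samples t implied by that d for increasingly large (hypothetical) equal-sized groups. To stay within the Python standard library, the two-sided p value uses the normal approximation to the t distribution, which is reasonable only for large degrees of freedom; the function names are illustrative, not from any statistics package.

```python
import math

def t_from_d(d: float, n_per_group: int) -> float:
    """Independent-samples t implied by Cohen's d when both groups have n_per_group cases."""
    # t = d * sqrt(n1 * n2 / (n1 + n2)), which reduces to d * sqrt(n / 2) for equal groups
    return d * math.sqrt(n_per_group / 2)

def two_sided_p_normal(t: float) -> float:
    """Two-sided p value from the standard normal approximation to the t distribution."""
    # P(|Z| >= |t|) = erfc(|t| / sqrt(2)); close to the exact t-based p when df is large
    return math.erfc(abs(t) / math.sqrt(2))

# The same trivial effect (d = .10) tested with increasingly large samples
for n in (100, 500, 2000):
    t = t_from_d(0.10, n)
    print(f"n per group = {n:4d}  t = {t:.2f}  p = {two_sided_p_normal(t):.4f}")
```

With the effect held constant, p moves from well above .05 to well below it as the sample grows, even though the practical magnitude of the difference never changes.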

These arguments against interpreting a p value as an indicator of practical significance lead researchers to recognize the value of including effect size measures when executing, interpreting, and reporting statistical tests. As an example of the application of these arguments, effect size could have provided an additional basis, beyond a statistically significant t-test, for the conclusions drawn in a recent agricultural education study. The authors compared students’ perceptions of agriculture in schools with and without an agriculture program. The data in Table 2 show that even though the t-test was statistically significant (t = 2.00, p = .046, df = 1,767), the Cohen’s effect size value (d = .10) did not reach Cohen’s minimum value for a “small” effect size (d = .20). This additional information may have led the researchers to conclude that the difference has negligible practical significance and that substantial recommendations may not be warranted based on the results of this t-test.

As can be seen from the research literature regarding effect size and the example above, it is the researcher’s responsibility to select the most appropriate sample size and statistical test(s), properly set the alpha value, properly select the appropriate effect size measure, determine the most appropriate interpretation method, clearly report all results, and base conclusions and recommendations on the overall results (i.e., the “big picture” based on BOTH the p value interpretation AND the effect size interpretation). These actions increase the researcher’s ability to determine not only statistical significance but also practical significance, strengthening the researcher’s judgment about whether an outcome may or may not have occurred by chance and whether it is of practical importance (Capraro, 2004; Carver, 1993; Fagley & McKinney, 1983; Fan, 2001; Kline, 2009; Robinson & Levin, 1997; Shaver, 1993; Thompson, 1996, 1998, 2009).


Table 2

Comparison of General Agriculture Perceptions by Students in Schools with an Agriculture Program versus Students in Schools with No Agriculture Program (N = 1,953)

| | Agriculture program M | Agriculture program SD | No agriculture program M | No agriculture program SD | t | df | p | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| Student perceptions of agriculture | 20.11 | 2.68 | 19.86 | 2.55 | 2.00 | 1767 | .046 | .10 |

Note. The data in this table were taken from a recently published agricultural education research manuscript. Adapted with permission.
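The Cohen’s d in Table 2 can be checked from the reported summary statistics. Table 2 does not give the two group sizes, so the sketch below assumes equal groups. Under that assumption the pooled standard deviation reduces to the root mean square of the two SDs, and d can also be approximated from t and df as 2t/√df. Both routes give d of about .10. The function names are hypothetical.

```python
import math

def d_from_summary(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Cohen's d from group means and SDs, ASSUMING equal group sizes."""
    # With n1 == n2 the pooled SD is the root mean square of the two group SDs
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

def d_from_t(t: float, df: int) -> float:
    """Approximate Cohen's d from an independent-samples t and its df (equal-n approximation)."""
    return 2 * t / math.sqrt(df)

# Values reported in Table 2
d_means = d_from_summary(20.11, 2.68, 19.86, 2.55)
d_t = d_from_t(2.00, 1767)
print(f"d from means and SDs: {d_means:.2f}")  # 0.10
print(f"d from t and df:      {d_t:.2f}")      # 0.10
```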


Cautions Applicable to Effect Size Interpretation 

Just as a researcher must be cautious regarding the violation of assumptions when computing a parametric test statistic, one must recognize that effect size measures are also sensitive to violations of assumptions, specifically non-normality and heterogeneity of variance (Leech & Onwuegbuzie, 2002). With this in mind, researchers must first ensure that the assumptions of the statistical test are satisfied and then carefully select the most appropriate effect size measure. The researcher should select effect size measures after testing for violations of assumptions and before actually executing parametric or non-parametric tests. Researchers using non-parametric test statistics should ensure that the effect size measure selected is not a parametric effect size measure (Leech & Onwuegbuzie, 2002). Additionally, researchers should be cautious when completing analyses with small samples, because small sample sizes can distort effect size estimates.

The next caution in reporting effect size concerns interpretation. In his initial publication proposing an interpretation of effect size measures, Cohen (1988) did not anticipate such wide use and acceptance. However, Cohen’s book and other publications such as Davis (1971) and Hinkle, Wiersma, and Jurs (1979) have permeated the education literature, both as references and as subjects of debate, and they provide points of reference to help researchers decide how to interpret the magnitude of their results. Methods for determining effect size for the most commonly used statistical analyses are presented below in the “Effect Size Measures” section of this manuscript. The authors note, however, that one should not rely solely on this manuscript (or any one publication), as each provides only a limited perspective on the appropriate use of effect sizes. Additionally, most of the effect size measures presented here are available in current statistical programs such as SAS and SPSS and on the Internet; other effect size measures are available but must be calculated by hand.

Effect Size Measures 

Rosenthal (1994) divided effect size measures into two families, the d family and the r family. This division aids understanding of how effect size measures should be applied. The d family is most often associated with variations on standardized mean differences, while the r family is expressed in terms of the correlation coefficient (r or r²). With some parametric test statistics, either family may provide an appropriate effect size estimate. This manuscript discusses the effect size measures commonly applied to the parametric tests most frequently published in JAE; these span the d and r families of indices.
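Because the two families describe the same two-group comparison in different metrics, a standard conversion links them. The sketch below assumes two groups of equal size, for which r = d/√(d² + 4) and d = 2r/√(1 − r²). With unequal groups the conversion changes, so treat this as illustrative only. The function names are hypothetical.

```python
import math

def d_to_r(d: float) -> float:
    """Convert a standardized mean difference (d family) to a correlation (r family).
    Assumes two groups of equal size."""
    return d / math.sqrt(d ** 2 + 4)

def r_to_d(r: float) -> float:
    """Convert a correlation (r family) to a standardized mean difference (d family).
    Assumes two groups of equal size."""
    return 2 * r / math.sqrt(1 - r ** 2)

for d in (0.20, 0.50, 0.80):  # Cohen's (1988) benchmarks for d
    print(f"d = {d:.2f}  ->  r = {d_to_r(d):.2f}")
```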

The previously mentioned analysis of recent JAE articles revealed the most commonly used statistical tests. These included multiple regression, independent t-test, ANOVA, bivariate correlation, ANCOVA, Chi-square, and factor analysis (Table 3). Other analyses common to most social science research include the paired t-test and point-biserial correlation. Each of these statistical tests has associated effect size measures that are most appropriate. Possible effect size measures are presented in Table 3, and their use and interpretation are discussed below. Note that although some effect size measures are used with different tests (e.g., Cohen’s d), the denominator is calculated differently, as discussed following Table 3.


Table 3

Statistical Tests Reported in the Journal of Agricultural Education

| Statistical test | Reported use in JAE 2007-2009 | Potential effect size measuresᵃ |
|---|---|---|
| Multiple regression | 7 | Coefficient of determination (R²) |
| Independent t testᵇ | 6 | Cohen’s d, Hedges’s g, Glass’s delta |
| ANOVA | 5 | Cohen’s f, Omega squared |
| Bivariate correlation | 4 | Correlation coefficient (Pearson’s r, Spearman’s rho) |
| ANCOVA | 3 | Cohen’s f, Omega squared |
| Chi square | 2 | Phi coefficient (2 x 2 table), Cramer’s V (larger than 2 x 2 table) |
| Paired t testᵇ | 1 | Cohen’s d |
| Point biserial correlation | 1 | Correlation coefficient (point biserial) |

ᵃ See Table 4 for potential effect size descriptors. ᵇ The formulas used to calculate Cohen’s d differ for paired and independent t-tests (Cohen, 1988).


Independent t-tests 

Cohen’s d statistic is a common measure to 
estimate effect size for independent samples 
t-tests (Cohen, 1988). If the statistical analysis 
program utilized by the researcher does not 
calculate Cohen’s d, one will need the following 
formulas to calculate the pooled standard 
deviation and the Cohen’s d statistic: 

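Pooled standard deviation:

$$s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}$$

Then, Cohen’s d = difference between sample means / pooled standard deviation:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}$$

where n₁ and n₂ are the two sample sizes, s₁ and s₂ are the two sample standard deviations, and X̄₁ and X̄₂ are the two sample means.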
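The pooled standard deviation and Cohen’s d formulas can be sketched with only the Python standard library. `statistics.variance` uses the n − 1 denominator, which matches s² in the formula. The scores are hypothetical, and `pooled_sd` and `cohens_d` are illustrative names rather than functions from SAS, SPSS, or any package.

```python
import math
from statistics import mean, variance

def pooled_sd(group1, group2):
    """Pooled standard deviation for two independent samples."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the n - 1 denominator, i.e. s^2
    numerator = (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    return math.sqrt(numerator / ((n1 - 1) + (n2 - 1)))

def cohens_d(group1, group2):
    """Cohen's d: difference between sample means divided by the pooled SD."""
    return (mean(group1) - mean(group2)) / pooled_sd(group1, group2)

# Hypothetical scores for two independent groups
treatment = [4, 5, 6, 7, 8]
control = [2, 3, 4, 5, 6]
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```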

For formula assistance, Lee Becker provides a web-based effect size calculator (Cohen’s d) for independent samples t-tests at http://www.uccs.edu/~faculty/lbecker/index.html. Hedges’s g and Glass’s delta are also commonly referenced measures of effect size for independent t-tests (Kline, 2009; Rosenthal & Rosnow, 1991). Hedges’s g is most appropriate for very small samples.
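A minimal sketch of the two alternatives follows. Here Hedges’s g is Cohen’s d multiplied by the common approximate small-sample correction J = 1 − 3/(4(n₁ + n₂) − 9). Naming conventions vary, and some texts call the uncorrected pooled-SD estimate g and the corrected one g* or “unbiased d”, so check which form a given program reports. Glass’s delta standardizes the mean difference by the control group’s SD alone. The data and function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    pooled = math.sqrt(((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled

def hedges_g(group1, group2):
    """Cohen's d multiplied by the approximate small-sample correction factor J."""
    n = len(group1) + len(group2)
    j = 1 - 3 / (4 * n - 9)  # J = 1 - 3 / (4(n1 + n2) - 9); approaches 1 as n grows
    return cohens_d(group1, group2) * j

def glass_delta(treatment, control):
    """Glass's delta: mean difference standardized by the control group's SD only."""
    return (mean(treatment) - mean(control)) / stdev(control)

treatment = [4, 5, 6, 7, 8]
control = [2, 3, 4, 5, 6]
print(f"d = {cohens_d(treatment, control):.3f}")
print(f"g = {hedges_g(treatment, control):.3f}")
print(f"delta = {glass_delta(treatment, control):.3f}")
```

With only ten cases, the correction shrinks d noticeably. With hundreds of cases, g and d are practically identical.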

Paired t-tests 

Cohen’s d is also applicable to estimate 
effect size for paired samples t-tests. Paired 
sample t-tests compare group means when the 
two groups are correlated in various research 
designs (e.g., matched pairs, repeated measures, 
before-after). The denominator should be 
calculated using the original standard deviations 
(Dunlop, Cortina, Vaslow, & Burke, 1996). 
Researchers should be cautious when using 
web-based effect size calculators to ensure that 
the calculator they select is appropriate for 
paired t-tests. 
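The difference between the two possible denominators for paired data can be seen in a small sketch with hypothetical pretest and posttest scores. One common choice, shown here, follows the recommendation to use the original standard deviations by taking the root mean square of the pretest and posttest SDs. Dividing instead by the SD of the difference scores gives a much larger value when the paired scores are highly correlated, which is the overestimate the caution above warns about. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_d_original_sds(before, after):
    """Cohen's d for paired scores, standardized by the original (pre and post) SDs,
    not by the SD of the difference scores."""
    mean_diff = mean(after) - mean(before)
    # Root mean square of the two original SDs
    sd_original = math.sqrt((stdev(before) ** 2 + stdev(after) ** 2) / 2)
    return mean_diff / sd_original

def d_from_difference_scores(before, after):
    """Mean difference divided by the SD of the difference scores, shown for contrast only."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / stdev(diffs)

# Hypothetical pretest/posttest scores for the same five students
pretest = [10, 12, 11, 14, 13]
posttest = [12, 14, 12, 15, 15]
print(f"d (original SDs):        {paired_d_original_sds(pretest, posttest):.2f}")
print(f"d (difference-score SD): {d_from_difference_scores(pretest, posttest):.2f}")
```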
Analysis of Variance and Analysis of Covariance

Both Cohen’s f (Cohen, 1988) and omega squared (ω²) are common measures of effect size for ANOVA and ANCOVA. Both are based on the proportion of variance explained by the categorical variable. Cohen’s f is derived from eta squared (η²), the proportion of variance explained in the sample, while omega squared estimates the proportion of variance explained in the population. To calculate Cohen’s f, first calculate eta squared:

$$\eta^2 = \frac{SS_{Between}}{SS_{Total}}$$

Then, use the following formula to calculate Cohen’s f:

$$f = \sqrt{\frac{\eta^2}{1 - \eta^2}}$$

Omega squared (ω²) can be calculated as follows:

$$\omega^2 = \frac{SS_{Between} - (k - 1)MS_{Within}}{SS_{Total} + MS_{Within}}$$

In the formula for ω², k = number of groups. The sum of squares and mean square information is provided by most statistical programs.
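For a one-way ANOVA, the η², f, and ω² formulas can be computed directly from raw scores, as in the sketch below. The group data are hypothetical and chosen so the results are easy to check by hand (SS_Between = 14, SS_Within = 6, MS_Within = 1, so η² = .70 and ω² = 12/21). For ANCOVA, the adjusted sums of squares would come from the statistical program’s output instead; this sketch covers only the one-way case. The function name is hypothetical.

```python
import math
from statistics import mean

def anova_effect_sizes(groups):
    """Return (eta_squared, cohens_f, omega_squared) for a one-way ANOVA on raw scores."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    group_means = [mean(g) for g in groups]
    k, n_total = len(groups), len(scores)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_total = ss_between + ss_within  # equals the sum of squared deviations from the grand mean
    ms_within = ss_within / (n_total - k)
    eta_squared = ss_between / ss_total
    cohens_f = math.sqrt(eta_squared / (1 - eta_squared))
    omega_squared = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta_squared, cohens_f, omega_squared

eta2, f, omega2 = anova_effect_sizes([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(f"eta squared = {eta2:.3f}, Cohen's f = {f:.3f}, omega squared = {omega2:.3f}")
```

Omega squared comes out smaller than eta squared because it corrects for the upward bias of the sample estimate.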

Note: If the assumptions of equal sample sizes and homogeneity of variance are violated, effect size will be overestimated (Volker, 2006); therefore, caution should be used in interpreting and reporting the effect size measure. Use the descriptors in Table 4 to interpret these coefficients.


Correlations 

Perhaps the simplest measures and reporting of effect size exist for correlations, because the correlation coefficient itself is a measure of effect size. The most commonly used correlation statistics are Pearson’s r, the parametric product-moment correlation; Spearman’s rho (r_s), a nonparametric correlation computed on ranks; and the point-biserial correlation (r_pb), which is Pearson’s r computed between a continuous variable and a dichotomous variable coded 0 and 1. The practical importance of a correlation coefficient must be interpreted using descriptors for correlation coefficients. Several sets of descriptors for correlation coefficients are presented in Table 4.
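The relationships among the three correlation statistics can be made concrete with a standard-library sketch. Spearman’s rho is computed as Pearson’s r on ranks, with tied values given their average rank. The point-biserial coefficient is computed from the textbook formula (M₁ − M₀)/s × √(pq), where s is the SD with an n denominator, and it matches Pearson’s r on the 0/1 codes. All data and function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

def point_biserial(group, scores):
    """Point-biserial r from the textbook formula; group is coded 0/1."""
    ones = [s for g, s in zip(group, scores) if g == 1]
    zeros = [s for g, s in zip(group, scores) if g == 0]
    p = len(ones) / len(scores)
    # pstdev uses the n denominator, which this form of the formula requires
    return (mean(ones) - mean(zeros)) / pstdev(scores) * math.sqrt(p * (1 - p))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # monotonic but strongly nonlinear
print(f"Pearson r = {pearson_r(x, y):.2f}, Spearman rho = {spearman_rho(x, y):.2f}")

group = [0, 0, 0, 1, 1, 1]
scores = [3, 4, 5, 6, 6, 8]
print(f"r_pb = {point_biserial(group, scores):.3f}, Pearson r on 0/1 codes = {pearson_r(group, scores):.3f}")
```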

Non-parametric Measures 

The phi coefficient is commonly used to estimate the magnitude of association in 2 x 2 contingency tables. Phi is a Pearson product-moment coefficient calculated on two nominal, dichotomous variables whose categories have both been coded 0 and 1. Cramer’s V is commonly used to describe the magnitude of association between categorical variables for contingency tables larger than 2 x 2. SPSS, SAS, and other statistical analysis programs will calculate either the phi or Cramer’s V coefficient. Use the descriptors in Table 4 to interpret these coefficients.
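Both coefficients can be computed from a table of counts, as in the minimal sketch below. It uses the Pearson chi-square without a continuity correction, which is an assumption. Signed phi comes from the 2 x 2 cross-product formula, and Cramer’s V = √(χ²/(n(m − 1))), where m is the smaller of the number of rows and columns. For a 2 x 2 table, V equals the absolute value of phi. The counts and function names are hypothetical.

```python
import math

def chi_square(table):
    """Pearson chi-square (no continuity correction) and total n for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi_2x2(table):
    """Signed phi for a 2 x 2 table [[a, b], [c, d]] (both variables coded 0/1)."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    chi2, n = chi_square(table)
    m = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (m - 1)))

# Hypothetical counts
two_by_two = [[20, 10], [10, 20]]
two_by_three = [[15, 10, 5], [5, 10, 15]]
print(f"phi = {phi_2x2(two_by_two):.3f}, V (2 x 2) = {cramers_v(two_by_two):.3f}")
print(f"V (2 x 3) = {cramers_v(two_by_three):.3f}")
```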

Regression 

An effect size measure for simple or multiple regression is the coefficient of determination, R² (the squared multiple correlation coefficient). Most statistical analysis programs calculate this coefficient, which represents the proportion of the dependent variable’s variance that is explained by the independent variable(s). The effect size of the calculated R² may be interpreted using the set of descriptors proposed by Cohen (1988; see Table 4).
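For the simple (one-predictor) case, R² can be sketched as 1 − SS_residual/SS_total from a least-squares line. The sketch also computes adjusted R², which the APA Task Force listed among effect size measures, using 1 − (1 − R²)(n − 1)/(n − p − 1) for n cases and p predictors. For multiple regression, R² would normally be taken from the statistical program’s output. The data and function names are hypothetical.

```python
from statistics import mean

def r_squared_simple(x, y):
    """R^2 for a least-squares line y = a + b*x: 1 - SS_residual / SS_total."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n cases and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

x = [1, 2, 3, 4, 5]  # hypothetical predictor
y = [2, 4, 5, 4, 5]  # hypothetical outcome
r2 = r_squared_simple(x, y)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adjusted_r_squared(r2, n=len(x), p=1):.3f}")
```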
Table 4

Descriptors for Reporting and Interpreting Effect Size in Quantitative Research

| Reference | Effect size statistic | Values | Interpretation of effect size |
|---|---|---|---|
| Cohen, 1988 | Cramer’s Phi or Cramer’s V for nominal data | .10 | Small effect size |
| | | .30 | Medium effect size |
| | | .50 | Large effect size |
| Rea & Parker, 1992 | Cramer’s Phi or Cramer’s V for nominal data | .00 and under .10 | Negligible association |
| | | .10 and under .20 | Weak association |
| | | .20 and under .40 | Moderate association |
| | | .40 and under .60 | Relatively strong association |
| | | .60 and under .80 | Strong association |
| | | .80 and under 1.00 | Very strong association |
| Cohen, 1988 | Cohen’s d for independent t-tests | .20 | Small effect size |
| | | .50 | Medium effect size |
| | | .80 | Large effect size |
| Cohen, 1988 | Cohen’s f for ANOVA and ANCOVA | .10 | Small effect size |
| | | .25 | Medium effect size |
| | | .40 | Large effect size |
| Cohen, 1988 | R² for multiple regression | .0196 | Small effect size |
| | | .1300 | Medium effect size |
| | | .2600 | Large effect size |
| Keppel, 1991 | Omega squared (ω²) for ANOVA, ANCOVA | .01 | Small effect |
| | | .06 | Medium effect |
| | | .15 | Large effect |
| Kirk, 1996 | Omega squared (ω²) for ANOVA, ANCOVA | .010 | Small effect size |
| | | .059 | Medium effect size |
| | | .138 | Large effect size |
| Davis, 1971ᵃ | Correlation coefficients | .70 or higher | Very strong association |
| | | .50 to .69 | Substantial association |
| | | .30 to .49 | Moderate association |
| | | .10 to .29 | Low association |
| | | .01 to .09 | Negligible association |
| Hinkle, Wiersma, & Jurs, 1979ᵃ ᵇ | Correlation coefficients | .90 to 1.00 | Very high correlation |
| | | .70 to .90 | High correlation |
| | | .50 to .70 | Moderate correlation |
| | | .30 to .50 | Low correlation |
| | | .00 to .30 | Little if any correlation |
| Hopkins, 1997ᵃ | Correlation coefficients | .90 to 1.00 | Nearly, practically, or almost: perfect, distinct, infinite |
| | | .70 to .90 | Very large, very high, huge |
| | | .50 to .70 | Large, high, major |
| | | .30 to .50 | Moderate, medium |
| | | .10 to .30 | Small, low, minor |
| | | .00 to .10 | Trivial, very small, insubstantial, tiny, practically zero |

Note. Table adapted from Kotrlik & Williams, 2003. Adapted with permission.

ᵃ Several authors have published guidelines for interpreting the magnitude of correlation coefficients.
ᵇ Note the more stringent nature of these descriptors when compared to Davis (1971) and Hopkins (1997).
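Applying the Cohen (1988) descriptors in Table 4 can be sketched as a simple lookup. Cohen’s values are single benchmarks rather than ranges, so treating each one as the lower bound of a band (and labeling values under the smallest benchmark “below small”) is an assumption of this sketch, not a rule from Table 4. Consistent with the Discussion below, such labels should supplement, not replace, judgment informed by previous findings. The constant and function names are hypothetical.

```python
# Cohen (1988) benchmarks from Table 4, ordered from largest to smallest
COHEN_D = [(0.80, "large"), (0.50, "medium"), (0.20, "small")]
COHEN_F = [(0.40, "large"), (0.25, "medium"), (0.10, "small")]
COHEN_R2 = [(0.26, "large"), (0.13, "medium"), (0.0196, "small")]

def describe(value, benchmarks, below="below small"):
    """Label |value| with the largest benchmark it meets or exceeds.
    Treating each benchmark as the lower bound of a range is an assumption."""
    magnitude = abs(value)
    for cutoff, label in benchmarks:
        if magnitude >= cutoff:
            return label
    return below

print(describe(0.10, COHEN_D))   # d from the Table 2 example: below small
print(describe(-0.55, COHEN_D))  # medium (sign ignored)
print(describe(0.30, COHEN_F))   # medium
```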
Discussion

This manuscript does not identify all available effect size measures. The effect size measures selected for presentation address only the most commonly used statistical analyses in manuscripts published in JAE. Effect sizes are not difficult to calculate and report, and the references provided at the end of this article will serve as a good starting point for authors attempting to identify appropriate measures of effect size for other types of statistical analyses.

The guidelines referenced in this article for interpreting effect sizes should be taken as general guidelines to follow when previous findings and knowledge of the area studied do not exist. If previous findings or knowledge of the area studied exist, they should be used in concert with the statistical significance results and the calculated effect size to interpret the practical importance of the findings. Thompson (2000) supported these cautionary words by stating, “. . . it must be emphasized that if we mindlessly invoke Cohen’s rule of thumb, contrary to his strong admonitions, in place of the equally mindless consultation of p value cutoffs such as .05 and .01, we are merely electing to be thoughtless in a new metric” (Thompson, 2000, ¶ 18).

The authors’ main purpose in writing this article was to encourage JAE authors to improve the reporting of their research by including and interpreting effect size measures when appropriate. The authors hope that this article will serve as a useful resource for agricultural education researchers and that the reporting of effect sizes will strengthen the quantitative research articles published in JAE.